

## Research Journal of Pharmaceutical, Biological and Chemical Sciences

# An Efficient Architecture for Adaptive Digital FIR Filter Implementation Using LMS Algorithm.

### Senthil Singh C<sup>1</sup>, Priyanka V<sup>2\*</sup>.

<sup>1</sup>Professor, Adhi College of Engineering and Technology, India. <sup>2</sup>Assistant Professor, KCG College of Technology, India.

#### ABSTRACT

Digital finite impulse response (FIR) filters are the basis for various digital signal processing applications, including adaptive signal processing applications such as channel equalization, acoustic echo cancellation and system identification. The basic operation needed to implement an FIR filter is the signed multiply-and-accumulate (MACS), which is traditionally performed using a hardware multiplier peripheral in a DSP device. In a real-time digital filter algorithm, the computation and memory-move operations have to be completed within one sample period. The number of computations performed within a single sample period depends on the number of taps of the filter, i.e., the order of the filter. This limits processors to handling real-time FIR filter algorithms only at low sample rates and with a reduced number of filter taps. Therefore, efficient architectures have to be designed with a focus on high flexibility, low complexity and high speed. This paper presents an efficient adaptive LMS FIR filter architecture that uses a single multiplier and adder irrespective of the number of taps, based on a time-sharing multiplier architecture. The Output Product Coding (OPC) scheme and a parallel pipelined multiplier architecture are used to optimize the performance of the filter. The results are validated using FPGA-in-the-Loop (FIL) simulation in the MATLAB/Simulink-xPC target toolbox. A reduction in complexity and an increase in speed were achieved. The performance of these architectures is compared with conventional multichannel filters and with other existing architectures.

Keywords: Digital filter, Multiply and Accumulate (MAC), Output Product Coding (OPC), Time sharing multiplier





#### INTRODUCTION

Finite Impulse Response (FIR) filtering is one of the most widely used operations in low-power, high-speed digital signal processing (DSP) systems, driven by the rapid growth in mobile computing and portable multimedia applications<sup>[1-3]</sup>. FIR filters play a crucial role in many signal processing applications such as digital communication, speech processing and seismic signal processing. A wide variety of tasks such as spectral shaping, matched filtering, interference cancellation and channel equalization can be performed with these filters<sup>[9]</sup>. With recent advances in DSP technology, large sampling array sizes are required for a variety of applications such as communication and multimedia, in which the information from a single channel may be erroneous and time consuming to process<sup>[4-5]</sup>. Hence, multi-channel signal processing is desirable for reliable and efficient processing of these signals. The real-time implementation of higher-order filters is a challenging task because the number of multiply-accumulate (MAC) operations required per filter output increases linearly with the filter order.

Adaptive filters are extensively used in several signal processing applications such as channel equalization, acoustic echo cancellation, interference cancellation and system identification<sup>[6]</sup>. In adaptive filter structures, the speed of operation decreases and the complexity increases as the number of taps increases<sup>[7-8]</sup>. Several attempts have been made to develop low-complexity, dedicated VLSI systems for these filters. Therefore, efficient architectures have to be designed with a focus on high flexibility, low complexity and high speed. Among the many adaptive algorithms, probably the best known is the least mean square (LMS) algorithm<sup>[10]</sup>, whose very low computational complexity makes it attractive for practical implementation.

#### PROBLEM IDENTIFICATION

A considerable amount of research has analysed the area, delay and power of various FIR filter architectures. While researchers have focused mainly on reducing hardware complexity and minimizing power consumption, speed of operation plays a more vital role in the design of channel filters, because the filter has to operate at high sampling frequencies. The proposed adaptive FIR filter architecture addresses this issue by time sharing the multiplier across a single MAC core and increasing the operating frequency of the filter output. In adaptive filter structures, an increase in the number of taps reduces the speed of operation and increases the complexity. The proposed work addresses these two problems by using pipelining and a time-sharing architecture with a single multiplier and adder, irrespective of the number of taps.

#### **PROPOSED WORK**

This section describes an efficient adaptive LMS FIR filter architecture based on a time-sharing multiplier architecture. Optimization of the filter performance using the OPC scheme and a parallel pipelined multiplier architecture is also discussed.

#### Efficient Adaptive FIR Filter Architecture Using LMS Algorithm

An adaptive filter has the same kind of implementation as a fixed filter, built from delay elements, adders and multipliers. The adaptation algorithm used in this paper is the LMS algorithm, which has an acceptable convergence rate with a small final error after convergence. The LMS algorithm starts by calculating the filter output from the input samples currently held in the filter's delay line. This output, the estimate of the received signal Y(n), is then subtracted from the reference signal d(n) to give the error, as shown in equation (3.1). The error is used to calculate the new filter coefficients according to equation (3.2).

$$e(n) = d(n) - Y(n) \tag{3.1}$$

$$w(n+1) = w(n) - \mu \nabla \epsilon[n] \tag{3.2}$$
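Equations (3.1) and (3.2) can be illustrated with a minimal floating-point sketch in Python. It is not the hardware architecture of this paper. It assumes the common LMS simplification in which the gradient of the mean square error is estimated from the instantaneous squared error, giving the update w(n+1) = w(n) + μ e(n) x(n), with the factor 2 absorbed into μ. The function name `lms_filter`, the test signal and the 2-tap "unknown" system are hypothetical and chosen only for illustration.

```python
import math

def lms_filter(x, d, num_taps, mu):
    """Floating-point LMS adaptive FIR filter.

    x: input samples, d: desired (reference) samples.
    Returns (outputs, errors, final_weights).
    """
    w = [0.0] * num_taps       # filter coefficients, initialised to zero
    taps = [0.0] * num_taps    # delay line, taps[0] holds the newest sample
    outputs, errors = [], []
    for x_n, d_n in zip(x, d):
        taps = [x_n] + taps[:-1]                       # shift in the new sample
        y_n = sum(wk * xk for wk, xk in zip(w, taps))  # FIR output Y(n)
        e_n = d_n - y_n                                # equation (3.1)
        # Equation (3.2) with the gradient estimated as -2*e(n)*x(n);
        # the factor 2 is absorbed into mu (an assumption of this sketch).
        w = [wk + mu * e_n * xk for wk, xk in zip(w, taps)]
        outputs.append(y_n)
        errors.append(e_n)
    return outputs, errors, w

# System identification demo with a hypothetical 2-tap "unknown" system.
unknown = [0.5, -0.3]
x = [math.sin(0.3 * n) + 0.5 * math.sin(1.7 * n) for n in range(2000)]
d = [unknown[0] * x[n] + (unknown[1] * x[n - 1] if n > 0 else 0.0)
     for n in range(len(x))]
y, e, w = lms_filter(x, d, num_taps=2, mu=0.06)
print("estimated weights:", w)
```

In this noise-free setting the estimated weights approach the coefficients of the unknown system and the error decays towards zero.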

Equation (3.2) is called the weight update equation. The parameter $\mu$ is called the step size, and $\epsilon$[n] represents the mean square error. In a conventional adaptive FIR filter architecture, the number of multipliers increases with the number of taps. To overcome this difficulty, an efficient architecture is proposed in which the multiplier is time shared across a single MAC core by increasing the output frequency. The filter operating frequency is increased by inserting pipeline registers.

March – April 2017 RJPBCS 8(2) Page No. 569

The proposed adaptive FIR architecture for 2 taps is shown in Figure 3.1. The multiplexer selects the data from the delay registers, which are multiplied by the filter coefficients stored in registers $C_0$\_reg and $C_1$\_reg. The accumulator block adds the previous data value to the present data value and is reset to zero after 2 clock cycles. The multiplexer select lines and the accumulator operation are controlled by a generic counter. Similarly, N taps can be implemented with a single MAC core by using the appropriate delay registers, multiplexer and counter. The error signal is the difference between the desired signal 'din' and the filter output 'Y\_out', and it is used as the input for updating the filter coefficients. The error signal is multiplied by the step size $\mu$ for better convergence, and the result is multiplied by x\_in to simultaneously update the filter coefficients $C_0$\_reg and $C_1$\_reg.
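The time-sharing idea can be sketched as a cycle-level behavioural model in Python. This is a software illustration under stated assumptions, not the RTL of Figure 3.1: the class name `TimeSharedMacFir` and its attributes are hypothetical, the pipeline registers are not modelled, and the coefficient update path is left out so that only the single-MAC filtering loop is shown. Each loop iteration stands for one clock cycle in which the counter drives the multiplexer select.

```python
class TimeSharedMacFir:
    """Behavioural model of an N-tap FIR using one multiplier and one adder."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)         # coefficient registers C_0_reg ... C_{N-1}_reg
        self.delay = [0.0] * len(coeffs)   # delay registers, delay[0] is the newest sample
        self.multiplies = 0                # how often the single multiplier is used

    def process_sample(self, x_in):
        """Consume one input sample and return one output after N MAC cycles."""
        self.delay = [x_in] + self.delay[:-1]
        acc = 0.0                                  # accumulator cleared for each new sample
        for count in range(len(self.coeffs)):      # generic counter acts as mux select
            product = self.delay[count] * self.coeffs[count]  # the one shared multiplier
            self.multiplies += 1
            acc += product                         # the one adder with accumulator
        return acc

fir = TimeSharedMacFir([0.25, 0.5])
outputs = [fir.process_sample(x) for x in [1.0, 2.0, 3.0]]
print(outputs, fir.multiplies)   # [0.25, 1.0, 1.75] 6
```

The model makes the trade-off visible: the hardware cost stays at one multiplier and one adder for any N, while each output sample takes N multiplier cycles, which is why the MAC core has to run at a higher clock frequency than the sample rate.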



Figure 3.1 Proposed adaptive FIR filter architecture

The performance of the filter is further enhanced by introducing two multiplier schemes that reduce the complexity and the critical path. Parallel pipelined multipliers are used in systems where increased performance is required. In this multiplier, pipeline registers are inserted to increase the speed of operation by breaking the combinational logic into sequential stages. With the OPC-based L-bit multiplier, the number of memory locations needed to store partial products is reduced from 2<sup>L</sup> to 2<sup>L-2</sup>. Hence the memory size is greatly reduced compared to ODD schematic based multiplication, which leads to a reduction in complexity.
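The memory saving quoted above can be made concrete with a few lines of Python. The helper `lut_words` is a hypothetical name, and the sketch only evaluates the two location counts stated in the text (2<sup>L</sup> and 2<sup>L-2</sup>); it does not model the OPC coding itself.

```python
def lut_words(word_length, opc=False):
    """Number of stored partial products for an L-bit LUT-based multiplier."""
    if opc:
        return 2 ** (word_length - 2)   # OPC count as stated in the text
    return 2 ** word_length             # full look-up table

for L in (4, 8, 16):
    full, reduced = lut_words(L), lut_words(L, opc=True)
    print(f"L={L}: {full} -> {reduced} locations (factor {full // reduced})")
```

For any word length L the reduction factor is 4, so an 8-bit multiplier needs 64 locations instead of 256.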

#### **RESULT AND DISCUSSION**

The proposed adaptive FIR filter architectures are implemented using two methods, namely the OPC multiplier and the parallel pipelined multiplier. FIL simulation is done using the MATLAB/Simulink-xPC target toolbox connected to an Altera FPGA board, which allows the models to be tested in real time. As shown in Figure 4.1, a 400 Hz sine wave is used as the input signal of the adaptive FIR filter. Figure 4.2 shows the FPGA-in-the-Loop (FIL) simulation results obtained from the adaptive FIR filter implementation: the time-domain curves of a 100 Hz desired signal (dut\_ref), the error signal and the output signal. The FIR filter coefficients are updated using the LMS algorithm with a step size of $\mu$ = 0.06. It can be noted that the output signal almost converges to the desired signal.



Figure 4.1 Input signal of adaptive FIR filter

Figure 4.2 FIL Simulation

Table 4.1 compares the synthesis results of the proposed 16-tap filter architectures with existing architectures synthesized on a Xilinx Virtex-5 FPGA device. The Minimum Sampling Period (MSP), Maximum Sampling Frequency (MSF) and Number Of Slices (NOS) are listed in Table 4.1, together with the slice register (SREG) and slice LUT (SLUT) counts. Both proposed architectures provide a high sampling frequency and good area efficiency relative to the existing architectures, owing to the pipeline registers placed between the multiplier and the adder. In the OPC-based adaptive filter architecture, high speed is achieved because the LUT access time is shorter than the normal multiplication time. The pipelined multiplier architecture achieves low complexity due to the bit product matrix.

#### Table 4.1 Comparison of proposed adaptive FIR filter with other architectures

| Design                                               | MSP(ns) | MSF(MHz) | NOS | SREG | SLUT |
|------------------------------------------------------|---------|----------|-----|------|------|
| Meher                                                | 17.35   | 57       | 178 | 412  | 267  |
| LogiCORE IP FIR Compiler Xilinx                      | 3.96    | 252      | 368 | 970  | 806  |
| Sang Yoon Park [R=1]                                 | 2.27    | 440      | 310 | 1120 | 755  |
| Sang Yoon Park [R=2]                                 | 5.11    | 195      | 205 | 671  | 517  |
| Sang Yoon Park [R=4]                                 | 10.91   | 91       | 126 | 397  | 277  |
| Proposed Design with OPC                             | 3.096   | 323.043  | 220 | 352  | 1488 |
| Proposed Design with parallel pipeline Multiplier    | 7.092   | 141.002  | 164 | 338  | 374  |
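The MSP and MSF columns of Table 4.1 can be cross-checked with a short Python sketch. It assumes that the maximum sampling frequency is simply the reciprocal of the minimum sampling period, an assumption the listed values agree with to within about 1 MHz. The dictionary `TABLE_4_1` and the helper `msf_mhz` are hypothetical names; the numbers are copied from the table.

```python
# Rows of Table 4.1: design -> (MSP in ns, listed MSF in MHz)
TABLE_4_1 = {
    "Meher": (17.35, 57),
    "LogiCORE IP FIR Compiler Xilinx": (3.96, 252),
    "Sang Yoon Park [R=1]": (2.27, 440),
    "Sang Yoon Park [R=2]": (5.11, 195),
    "Sang Yoon Park [R=4]": (10.91, 91),
    "Proposed Design with OPC": (3.096, 323.043),
    "Proposed Design with parallel pipeline Multiplier": (7.092, 141.002),
}

def msf_mhz(msp_ns):
    """Sampling frequency in MHz for a sampling period given in ns."""
    return 1000.0 / msp_ns

# Absolute difference between the computed and the listed MSF, per design.
deviations = {name: abs(msf_mhz(msp) - listed)
              for name, (msp, listed) in TABLE_4_1.items()}

for name, (msp, listed) in TABLE_4_1.items():
    print(f"{name}: computed {msf_mhz(msp):.1f} MHz, listed {listed} MHz")
```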

#### Table 4.2 Comparison of proposed adaptive FIR filter with existing architectures

| Design                                            | Total logic elements (16 taps) | Total logic elements (32 taps) |
|---------------------------------------------------|--------------------------------|--------------------------------|
| Daniel J. Allred [k=2]                            | 1309                           | 2244                           |
| Proposed Design with OPC                          | 1240                           | 1704                           |
| Proposed Design with parallel pipeline Multiplier | 679                            | 1142                           |

Table 4.2 compares the 16-tap and 32-tap proposed adaptive FIR filters with an existing architecture synthesized on the FPGA device. The proposed architectures use fewer logic elements than the existing architecture because of the single MAC core filter structure.
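As a worked illustration, the relative savings in logic elements can be computed directly from the figures in Table 4.2, taking the design of Allred [k=2] as the baseline. The percentages below are derived here from the tabulated counts and rounded to one decimal place.

$$\text{16 taps, OPC: } \frac{1309 - 1240}{1309} \approx 5.3\%, \qquad \text{16 taps, parallel pipelined: } \frac{1309 - 679}{1309} \approx 48.1\%$$

$$\text{32 taps, OPC: } \frac{2244 - 1704}{2244} \approx 24.1\%, \qquad \text{32 taps, parallel pipelined: } \frac{2244 - 1142}{2244} \approx 49.1\%$$

On these numbers, the parallel pipelined multiplier design roughly halves the logic element count at both filter lengths, while the saving of the OPC design grows with the number of taps.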

#### CONCLUSION

In this paper, efficient schemes are proposed for the high-throughput, single-MAC-based implementation of adaptive FIR digital filters. It is shown that the hardware cost can be substantially reduced by time sharing the multiplier across a single MAC core. The proposed adaptive FIR filter design results in a substantial area reduction compared to conventional adaptive FIR filter architectures for FPGA implementation. The proposed 16-tap adaptive FIR filter structure for FPGA implementation supports input sampling frequencies of up to 323 MHz. Thus the efficient adaptive FIR filter architecture achieves high speed and low complexity, and the flexibility of FPGA technology makes the architecture a viable alternative for the development of reconfigurable hardware for real-time signal processing applications.

#### REFERENCES

- [1] Meher, P. K. & Park, S. Y., Oct. 2011, 'High-throughput pipelined realization of adaptive FIR filter based on distributed arithmetic', in Proc. IEEE/IFIP 19th Int. Conf. VLSI-SOC, pp. 428–433.
- [2] Sang Yoon Park & Pramod Kumar Meher, July 2014, 'Efficient FPGA and ASIC Realizations of a DA-based Reconfigurable FIR Digital Filter', IEEE Transactions on Circuits and Systems-II: Express Briefs, Vol. 61, No. 7.
- [3] LogiCORE IP FIR Compiler v5.0, Xilinx, Inc., San Jose, CA, USA, 2010.
- [4] Pramod Kumar Meher, March 2010, 'New Approach to Look Up Table Design and Memory-Based Realization of FIR Digital Filter', IEEE Transactions on Circuits and Systems-I: Regular Papers, Vol. 57, No. 3.
- [5] Pramod Kumar Meher, Chandrasekaran, S. & Amira, A., Jul. 2008, 'FPGA realization of FIR filters by efficient and flexible systolization using distributed arithmetic', IEEE Trans. Signal Process., vol. 56, no. 7, pp. 3009–301
- [6] Park, J., et al., Feb. 2004, 'Computation Sharing Programmable FIR Filter for Low-Power and High-Performance Applications', IEEE J. Solid State Cir. Sys., vol. 39, no. 2, pp. 348–357.
- [7] Tsao, Y. C. & Choi, K., Jun. 2012, 'Area-Efficient VLSI Implementation for Parallel Linear-Phase FIR Digital Filters of Odd Length Based on Fast FIR Algorithm', IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 59, no. 6, pp. 371–375.
- [8] Cowan, C. F. N. & Mavor, J., August 1981, 'New digital-adaptive filter implementation using distributed-arithmetic techniques', IEE Proceedings, vol. 128, Pt. F, no. 4, pp. 225–230.
- [9] Sang Yoon Park & Pramod Kumar Meher, June 2013, 'Low-Power, High-throughput, and Low Area Adaptive FIR Filter Based on Distributed Arithmetic', IEEE Transactions on Circuits and Systems-II: Express Briefs, Vol. 60, No. 6.
- [10] Allred, D. J., Yoo, H., Krishnan, V., Huang, W. & Anderson, D. V., Jul. 2005, 'LMS adaptive filters using distributed arithmetic for high throughput', IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 7, pp. 1327–1337.
